The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about common practices and the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, and algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a considerable portion of participants (32%) stated that they did not have enough time for it, and 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based, and of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% performed ensembling, based on either multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
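To make the patch-based training strategy most respondents used for oversized samples concrete, here is a minimal NumPy sketch; the function name and shapes are illustrative and not taken from any surveyed solution.

```python
import numpy as np

def sample_patch(volume, patch_size=(64, 64, 64), rng=None):
    """Randomly crop a training patch from a volume too large to process at once."""
    rng = rng or np.random.default_rng()
    starts = [int(rng.integers(0, max(d - p, 0) + 1))
              for d, p in zip(volume.shape, patch_size)]
    return volume[tuple(slice(s, s + p) for s, p in zip(starts, patch_size))]

# Usage: feed small patches to the network instead of the full 3D image.
volume = np.zeros((512, 512, 256), dtype=np.float32)  # placeholder 3D scan
patch = sample_patch(volume)                          # shape (64, 64, 64)
```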
Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.
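As a minimal sketch of how one might load a released checkpoint, assuming the Hugging Face transformers library: the smaller bigscience/bloom-560m variant is used here so the example runs on modest hardware (the full 176B model is bigscience/bloom).

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load one of the publicly released BLOOM checkpoints.
tokenizer = AutoTokenizer.from_pretrained("bigscience/bloom-560m")
model = AutoModelForCausalLM.from_pretrained("bigscience/bloom-560m")

# Generate a short continuation from a prompt.
inputs = tokenizer("BLOOM is a 176B-parameter open-access", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```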
The past decade has witnessed a boom in computational methods and dataset curation for AI-aided drug discovery (AIDD). However, real-world pharmaceutical datasets often exhibit highly imbalanced distributions, which is largely overlooked by the current literature but may severely compromise the fairness and generalization of machine learning applications. Motivated by this observation, we introduce ImDrug, a comprehensive benchmark with an open-source Python library comprising 4 imbalance settings, 11 AI-ready datasets, 54 learning tasks, and 16 baseline algorithms tailored for imbalanced learning. It provides an accessible and customizable testbed for problems and solutions spanning a broad range of the drug discovery pipeline, such as molecular modeling, drug-target interaction, and retrosynthesis. We conduct extensive empirical studies with new evaluation metrics to demonstrate that existing algorithms fall short of solving medicinal and pharmaceutical challenges under data imbalance. We believe ImDrug opens up avenues for future research and development on real-world challenges at the intersection of AIDD and deep imbalanced learning.
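To make "baseline algorithms tailored for imbalanced learning" concrete, here is a generic focal-loss baseline in PyTorch; this is an illustrative sketch of one such standard technique, not ImDrug's actual API.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=None):
    """Focal loss: down-weights easy (well-classified) examples so that
    rare classes contribute more to the gradient."""
    ce = F.cross_entropy(logits, targets, weight=alpha, reduction="none")
    pt = torch.exp(-ce)  # probability assigned to the true class
    return ((1.0 - pt) ** gamma * ce).mean()

# Usage on a toy imbalanced batch (90% class 0, 10% class 1):
logits = torch.randn(10, 2)
targets = torch.tensor([0] * 9 + [1])
loss = focal_loss(logits, targets)
```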
Despite the rapid advance of unsupervised anomaly detection, existing methods still require training separate models for different objects. In this work, we present a unified framework that handles anomaly detection for multiple classes. Under such a challenging setting, popular reconstruction networks may fall into an "identical shortcut", where both normal and anomalous samples are recovered well, and hence fail to spot outliers. To tackle this obstacle, we make three improvements. First, we revisit the formulations of fully-connected, convolutional, and attention layers, and confirm the important role of the query embedding (i.e., within the attention layer) in preventing the network from learning the shortcut; we therefore propose a layer-wise query decoder to help model the multi-class distribution. Second, we employ a neighbor-masked attention module to further avoid information leakage from the input features to the reconstructed output features. Third, we propose a feature jittering strategy that urges the model to recover the correct message even with noisy inputs. We evaluate our algorithm on the MVTec-AD and CIFAR-10 datasets, where we surpass state-of-the-art alternatives by a sufficiently large margin. For example, when learning a unified model for the 15 categories in MVTec-AD, we surpass the second-best competitor on the tasks of both anomaly detection (from 88.1% to 96.5%) and anomaly localization (from 89.5% to 96.8%). Code will be made publicly available.
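A minimal PyTorch sketch of the feature jittering idea described above, assuming a reconstruction-style model; the noise scale and the encoder/decoder names are illustrative, not the paper's exact formulation.

```python
import torch

def feature_jitter(feats: torch.Tensor, scale: float = 0.1) -> torch.Tensor:
    """Add Gaussian noise to features during training; the decoder is then
    asked to reconstruct the clean features, discouraging an identity shortcut."""
    noise = torch.randn_like(feats) * feats.detach().abs().mean() * scale
    return feats + noise

# Training step (sketch, with hypothetical encoder/decoder modules):
# clean = encoder(images)
# recon = decoder(feature_jitter(clean))
# loss = torch.mean((recon - clean) ** 2)
```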
In model extraction attacks, adversaries can steal a machine learning model exposed via a public API by repeatedly querying it and adjusting their own model based on the obtained predictions. To prevent model stealing, existing defenses focus on detecting malicious queries and truncating or distorting outputs, thus necessarily introducing a trade-off between robustness and model utility for legitimate users. Instead, we propose to impede model extraction by requiring users to complete a proof of work before they can read the model's predictions. This deters attackers by greatly increasing (even up to 100x) the computational effort needed to leverage query access for model extraction. Since we calibrate the effort required to complete the proof of work for each query, this introduces only a slight overhead for regular users (up to 2x). To achieve this, our calibration applies tools from differential privacy to measure the information revealed by a query. Our method requires no modification of the victim model and can be applied by machine learning practitioners to guard their publicly exposed models against being easily stolen.
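A hash-style proof of work of the kind described above can be sketched as follows; the paper's key contribution, calibrating the difficulty via a differential-privacy measure of leaked information, is only indicated by a comment here, and all names are illustrative.

```python
import hashlib
import os

def solve_pow(challenge: bytes, difficulty_bits: int) -> int:
    """Find a nonce so that SHA-256(challenge || nonce) starts with
    `difficulty_bits` zero bits; expected work doubles per extra bit."""
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while True:
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1

def verify_pow(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))

# The server would scale difficulty_bits with the information a query reveals.
challenge = os.urandom(16)
nonce = solve_pow(challenge, difficulty_bits=16)  # ~65k hashes on average
assert verify_pow(challenge, nonce, 16)
```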
StarCraft II (SC2) is a real-time strategy game in which players produce and control multiple units to fight against an opponent's units. Owing to its difficulties, such as a huge state space, a varied action space, long time horizons, and imperfect information, SC2 has been a research hotspot for reinforcement learning. Recently, an agent called AlphaStar (AS) was proposed, which showed strong performance, achieving a high win rate of 99.8% against human players. We implemented a mini-scaled version called mini-AlphaStar (MAS) based on the AS paper and pseudocode. The difference between AS and MAS is that we substituted the hyper-parameters with smaller ones for mini-scale training. The code of MAS is fully open-sourced (https://github.com/liuruoze/mini-AlphaStar) for future research.
In this paper, we address referring expression comprehension: localizing an image region described by a natural language expression. While most recent work treats expressions as a single unit, we propose to decompose them into three modular components related to subject appearance, location, and relationship to other objects. This allows us to flexibly adapt to expressions containing different types of information in an end-to-end framework. In our model, which we call the Modular Attention Network (MAttNet), two types of attention are utilized: language-based attention that learns the module weights as well as the word/phrase attention that each module should focus on; and visual attention that allows the subject and relationship modules to focus on relevant image components. Module weights combine scores from all three modules dynamically to output an overall score. Experiments show that MAttNet outperforms previous state-of-the-art methods by a large margin on both bounding-box-level and pixel-level comprehension tasks. A demo and code are provided.
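A minimal sketch of the dynamic score combination described above, assuming PyTorch; the feature dimension and projection layer are illustrative, not MAttNet's exact implementation.

```python
import torch
import torch.nn.functional as F

def overall_score(lang_feat, module_scores, weight_proj):
    """Combine per-module matching scores with language-predicted weights:
    w = softmax(W @ lang_feat); score = sum_m w_m * s_m."""
    weights = F.softmax(weight_proj(lang_feat), dim=-1)  # (batch, 3)
    return (weights * module_scores).sum(dim=-1)         # (batch,)

# Toy usage: 3 modules (subject, location, relationship), hypothetical sizes.
weight_proj = torch.nn.Linear(512, 3)
lang_feat = torch.randn(4, 512)    # expression embedding
module_scores = torch.randn(4, 3)  # per-module region scores
scores = overall_score(lang_feat, module_scores, weight_proj)
```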
Figure 1: Example inpainting results of our method on images of natural scenes, faces, and textures. Missing regions are shown in white. In each pair, the left is the input image and the right is the direct output of our trained generative neural networks without any post-processing.
As the quality of optical sensors improves, there is a need for processing large-scale images. In particular, the ability of devices to capture ultra-high-definition (UHD) images and video places new demands on the image processing pipeline. In this paper, we consider the task of low-light image enhancement (LLIE) and introduce a large-scale database consisting of images at 4K and 8K resolution. We conduct systematic benchmarking studies and provide a comparison of current LLIE algorithms. As a second contribution, we introduce LLFormer, a transformer-based low-light enhancement method. The core components of LLFormer are an axis-based multi-head self-attention mechanism and a cross-layer attention fusion block, which together reduce the computational complexity significantly. Extensive experiments on the new dataset and existing public datasets show that LLFormer outperforms state-of-the-art methods. We also show that employing existing LLIE methods trained on our benchmark as a pre-processing step significantly improves the performance of downstream tasks, e.g., face detection in low-light conditions. The source code and pre-trained models are available at https://github.com/TaoWangzj/LLFormer.
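A minimal sketch of the axis-based attention idea, assuming PyTorch; the real LLFormer block differs in detail (and a single attention layer is shared across both axes here purely for brevity), but it shows how attending along one spatial axis at a time keeps each attention over H or W tokens instead of H * W.

```python
import torch
import torch.nn as nn

class AxisAttention(nn.Module):
    """Self-attention applied along one spatial axis at a time (width, then
    height), so no attention ever operates over the full H * W token grid."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (B, H, W, C)
        b, h, w, c = x.shape
        rows = x.reshape(b * h, w, c)                      # attend along width
        rows, _ = self.attn(rows, rows, rows)
        x = rows.reshape(b, h, w, c)
        cols = x.permute(0, 2, 1, 3).reshape(b * w, h, c)  # attend along height
        cols, _ = self.attn(cols, cols, cols)
        return cols.reshape(b, w, h, c).permute(0, 2, 1, 3)

# Toy usage: a 64x64 feature map with 32 channels keeps its shape.
x = torch.randn(2, 64, 64, 32)
y = AxisAttention(dim=32)(x)
```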
This paper presents a comprehensive survey of low-light image and video enhancement. We begin with the challenging mixed over-/under-exposed images, on which existing methods perform poorly. To this end, we propose two variants of the SICE dataset, named SICE_Grad and SICE_Mix. Next, we introduce Night Wenzhou, a large-scale, high-resolution video dataset, to address the lack of low-light video datasets, which has discouraged extending low-light image enhancement (LLIE) to videos. The Night Wenzhou dataset is challenging, since it consists of fast-moving aerial scenes and streetscapes with varying illumination and degradations. We conduct extensive analysis of key techniques and experimental comparisons of representative LLIE approaches using these newly proposed datasets and the current benchmark datasets. Finally, we address unresolved issues and propose future research topics for the LLIE community.